Method and apparatus for processing video, and storage medium

ABSTRACT

A method for processing a video includes: identifying a target object in a first video segment; acquiring a current video frame of a second video segment; acquiring a first image region corresponding to the target object in a first target video frame of the first video segment, and acquiring a second image region corresponding to the target object in the current video frame of the second video segment, wherein the first target video frame corresponds to the current video frame of the second video segment in terms of video frame time; and performing picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region to obtain a processed first video frame.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based upon and claims priority to Chinese Patent Application No. 202010345830.3, filed on Apr. 27, 2020, the content of which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The present disclosure relates to the technical field of computers, and more particularly, to a method and apparatus for processing a video, and a storage medium.

BACKGROUND

As a relatively common video effect technology, the clone effect of videos is manifested in that multiple instances of a same object (such as a person or an object) appear simultaneously in a same scenario of the video (i.e., a frame of image of the video). For example, the clone effect for a person A may be manifested in that multiple persons A appear in a same scenario of the video, and each person A may perform the same action or different actions. When there is a need to obtain a video with the clone effect, generally, several original video segments are first photographed to serve as source materials; then, the photographed source materials are spliced by professionals with the use of a video post-editing tool; and at last, several cloned bodies of the same object appear simultaneously in the same scenario of the video. In the above manner, the source materials are first photographed and the clone effect is produced subsequently, such that the production period is long, and no feedback can be obtained in time during photographing. Furthermore, the photographing of source materials needs to be planned in advance (for example, selection of scenarios, determination of view-finding positions, etc.), which results in a tedious preparation process; and the source materials are subsequently processed by professionals having specialized knowledge, so the requirements on specialized knowledge are high and the clone effect is difficult to implement.

SUMMARY

According to a first aspect of the embodiments of the present disclosure, a method for processing a video, applied to a terminal device, may include: identifying a target object in a first video segment; acquiring a current video frame of a second video segment; acquiring a first image region corresponding to the target object in a first target video frame of the first video segment, and acquiring a second image region corresponding to the target object in the current video frame of the second video segment, wherein the first target video frame corresponds to the current video frame of the second video segment in terms of video frame time; and performing picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region to obtain a processed first video frame.

According to a second aspect of the embodiments of the present disclosure, an apparatus for processing a video may include: a processor; and a memory configured to store instructions executable by the processor. The processor is configured to: identify a target object in a first video segment; acquire a current video frame of a second video segment; acquire a first image region corresponding to the target object in a first target video frame of the first video segment, and acquire a second image region corresponding to the target object in the current video frame of the second video segment, the first target video frame corresponding to the current video frame of the second video segment in terms of video frame time; and perform picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region to obtain a processed first video frame.

According to a third aspect of the embodiments of the present disclosure, a non-transitory computer-readable storage medium has stored thereon instructions that, when executed by a processor of a device, cause the device to perform a method for processing a video, the method including: identifying a target object in a first video segment; acquiring a current video frame of a second video segment; acquiring a first image region corresponding to the target object in a first target video frame of the first video segment, and acquiring a second image region corresponding to the target object in the current video frame of the second video segment, wherein the first target video frame corresponds to the current video frame of the second video segment in terms of video frame time; and performing picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region to obtain a processed first video frame.

It is to be understood that the above general description and the detailed description below are only exemplary and explanatory and are not intended to limit the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

FIG. 1 is a flowchart of a method for processing a video according to an exemplary embodiment.

FIG. 2 shows a result of identifying a target object in a method for processing a video according to an exemplary embodiment.

FIG. 3 is a flowchart of an operation for performing picture splicing according to an exemplary embodiment.

FIG. 4 is a flowchart of a method for processing a video according to another exemplary embodiment.

FIG. 5A-FIG. 5C are schematic diagrams of an interface of a terminal device in an implementation process of a method for processing a video, according to an exemplary embodiment.

FIG. 6 is a block diagram of an apparatus for processing a video according to an exemplary embodiment.

FIG. 7 is a block diagram of an apparatus for processing a video according to an exemplary embodiment.

DETAILED DESCRIPTION

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise represented. The implementations set forth in the following description of exemplary embodiments do not represent all implementations consistent with the present disclosure. Instead, they are merely examples of apparatuses and methods consistent with aspects related to the present disclosure as recited in the appended claims.

FIG. 1 is a flowchart of a method for processing a video according to an exemplary embodiment. The method may be applied to a terminal device. The terminal device may be, for example, a mobile phone, a computer, a messaging device, a tablet device, a Personal Digital Assistant (PDA), and the like. As shown in FIG. 1, the method may include the following operations.

In operation 11, a target object in a first video segment is identified.

In operation 12, a current video frame of a second video segment is acquired.

In operation 13, a first image region corresponding to the target object in a first target video frame of the first video segment is acquired, and a second image region corresponding to the target object in the current video frame of the second video segment is acquired.

In operation 14, picture splicing is performed on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region to obtain a processed first video frame.

In the embodiment, based on two existing video segments, for two video frames having corresponding time in the two video segments, a picture portion including the target object in one video frame is spliced with a picture portion including the target object in the other video frame to obtain the processed first video frame. Therefore, based on such a first video frame, a spliced video having the clone effect can be quickly obtained in the subsequent process, complex manual post-editing by a user becomes unnecessary, and the processing efficiency is high.

In an embodiment, the target object may be a living creature (such as a person, an animal, or a plant), and may also be a lifeless object (such as a disk or a computer). The method may implement the clone effect of the target object, i.e., enable the processed video to include, in a same scenario, at least two pictures of the target object photographed by a terminal device at different times.

In an embodiment, the first video segment is a segment of video photographed by the terminal device and including the target object. The target object may be photographed by the terminal device to obtain the first video segment. For example, if the clone effect of the person A needs to be produced, the person A is the target object, and the user may photograph the person A by operating the terminal device so that the picture includes the person A and the background other than the person A; and upon the completion of photographing, the first video segment may be obtained.

In an embodiment, the first video segment may be obtained in the following manner: in response to a first video segment photographing instruction, a video stream acquired in real time is recorded until a first video segment photographing stop instruction is received; and the recorded video stream is taken as the first video segment.

The first video segment photographing instruction is configured to instruct to photograph the first video segment, the first video segment photographing stop instruction is configured to instruct to end the photographing at the present time, and the video stream is formed by the video frames captured by the terminal device in real time, for example, in a viewing frame of the terminal device. Hence, the first video segment is the video stream recorded by the terminal device during the period from the reception of the first video segment photographing instruction to the reception of the first video segment photographing stop instruction.

The first video segment photographing instruction and the first video segment photographing stop instruction may be generated by the operation of the user on the terminal device. For example, the terminal device may be provided with a photographing start button, such as a physical button or a virtual button, for photographing the first video segment. If the user clicks the button, the first video segment photographing instruction is generated correspondingly. The terminal device may be provided with a photographing stop button for instructing to stop photographing the first video segment. If the user clicks the button, the first video segment photographing stop instruction is generated correspondingly. The photographing start button and the photographing stop button may be the same button, and may also be different buttons. Also for example, the terminal device may be provided with a press region, such as a region on a screen of the terminal device or a region on a body of the terminal device, for photographing the first video segment. If the user presses the region, the first video segment photographing instruction is generated correspondingly. If the user no longer presses the region, for example, a finger of the user changes from a state of pressing the region into a state of lifting from the pressed region, the first video segment photographing stop instruction is generated. In other words, the pressing operation means to photograph, and the releasing operation means to stop photographing.

In an embodiment, the first video segment photographing stop instruction may be generated automatically after the first video segment photographing instruction. For example, a photographing duration may be set in advance; upon detecting the first video segment photographing instruction, timing starts; and when the counted duration reaches the photographing duration, the first video segment photographing stop instruction is generated. In such a scenario, the duration of the first video segment is equal to the photographing duration set in advance.
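
For illustration only, the following is a minimal sketch of this recording flow, assuming an OpenCV-accessible camera; the three-second duration and the device index are illustrative values, not values fixed by the present disclosure.

```python
# Sketch: record a fixed-duration video segment; timing starts on the
# photographing instruction and stops when the preset duration elapses.
import time
import cv2

def record_segment(duration_s=3.0, device=0):
    cap = cv2.VideoCapture(device)
    frames = []
    start = time.monotonic()                      # photographing instruction received
    while time.monotonic() - start < duration_s:  # implicit stop instruction
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames  # the recorded video stream serves as the first video segment
```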

Identifying the target object in the first video segment in operation 11 may be implemented by identifying the target object in a video frame included in the first video segment. In an embodiment, operation 11 may include the following operation: according to a third video frame, pixels corresponding to the target object in the third video frame are determined through a target object identification model.

The third video frame is a frame of video in the first video segment. The target object identification model may identify whether each pixel in the image belongs to the target object. In an embodiment, after the third video frame is input to the target object identification model, an output result as shown in FIG. 2 may be obtained. The white portion in FIG. 2 indicates the target object, and the black portion in FIG. 2 indicates the non-target object.
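
As a hedged illustration of such per-pixel identification, the sketch below substitutes an off-the-shelf DeepLabV3 segmentation network for the unspecified target object identification model; treating VOC class 15 ("person") as the target object is an assumption of the example, not a requirement of the disclosure.

```python
# Sketch: per-pixel target-object identification with a stand-in model.
import torch
import torchvision.transforms.functional as TF
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()

def target_object_mask(frame_rgb):
    """Return a boolean mask: True where a pixel belongs to the target object."""
    x = TF.to_tensor(frame_rgb)  # HxWx3 uint8 -> 3xHxW float in [0, 1]
    x = TF.normalize(x, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    with torch.no_grad():
        logits = model(x.unsqueeze(0))["out"][0]  # [21, H, W] class scores
    return (logits.argmax(0) == 15).numpy()       # class 15: person (assumed target)
```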

In an embodiment, the target object identification model may be obtained in the following manner: training data is acquired, each piece of training data including a historical image and labeling information indicating whether each pixel in the historical image belongs to the target object; and an image segmentation model is trained according to the training data to obtain the target object identification model.

In an embodiment, the image segmentation model is a neural network model. In each round of training, the historical image is taken as input data of the image segmentation model, and the labeling information corresponding to the historical image is taken as the true output of the model, so as to adjust parameters in the model. After multiple rounds of training, when a model training stop condition is met, the obtained model is used as the target object identification model.
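
A minimal training sketch is given below; the data loader of (historical image, per-pixel label) pairs, the binary cross-entropy loss, and the fixed-epoch stop condition are assumptions of the example, since the disclosure does not fix them.

```python
# Sketch: train an image segmentation model on labeled historical images.
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()   # per-pixel "belongs to target object?" loss
    for _ in range(epochs):            # stop condition: fixed epoch budget here
        for image, label in loader:    # label: 1 where the pixel is the target
            opt.zero_grad()
            pred = model(image)        # assumed: logits with label's spatial size
            loss_fn(pred, label.float()).backward()
            opt.step()
    return model                       # used as the target object identification model
```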

By determining the pixels corresponding to the target object, the position of the target object in the third video frame may be located. Additionally, an image region corresponding to the target object in the third video frame can further be extracted intact from the third video frame according to these pixels corresponding to the target object.

In an embodiment, to identify the target object in the first video segment, the target object may be identified in each video frame of the first video segment; that is, each video frame in the first video segment is taken as the third video frame, and the above operation is executed. Furthermore, as mentioned above, by identifying the target object in the first video segment, the position of the target object in each video frame of the first video segment can be located, and the image region corresponding to the target object in each video frame of the first video segment can further be extracted intact.

As described above, in operation 12, a current video frame of a second video segment is acquired.

In an embodiment, the second video segment is a segment of video photographed by the terminal device and including the target object. In the embodiment, the method may splice the target objects in the first and second video segments into the same picture correspondingly. In another embodiment, the second video segment may be obtained in the following manner: in response to a video photographing instruction, a video stream acquired in real time is recorded until a video photographing stop instruction is received, and the recorded video stream is taken as the second video segment.

The video photographing instruction is configured to instruct to photograph the second video segment, and the video photographing stop instruction is configured to instruct to end the photographing of the second video segment at this time. The second video segment is the video stream recorded by the terminal device in the period from the reception of the video photographing instruction to the reception of the video photographing stop instruction.

The video photographing instruction and the video photographing stop instruction may be generated by the operation of the user on the terminal device. For example, the terminal device may be provided with a photographing start button, such as a physical button or a virtual button, for photographing the second video segment. If the user clicks the button, the video photographing instruction is generated correspondingly. The terminal device may be provided with a photographing stop button for instructing to stop photographing the second video segment. If the user clicks the button, the video photographing stop instruction is generated correspondingly. The photographing start button and the photographing stop button may be the same button, and may also be different buttons. Also for example, the terminal device may be provided with a press region, such as a region on a screen of the terminal device or a region on a body of the terminal device, for photographing the second video segment. If the user presses the region, the video photographing instruction is generated correspondingly. If the user no longer presses the region, for example, a finger of the user changes from a state of pressing the region into a state of lifting from the pressed region, the video photographing stop instruction is generated. In other words, the pressing operation means to photograph, and the releasing operation means to stop photographing.

In an embodiment, the video photographing stop instruction may be generated automatically after the video photographing instruction. For example, a photographing duration may be set in advance; upon detecting the video photographing instruction, timing starts; and when the counted duration reaches the photographing duration, the video photographing stop instruction is generated. In such a scenario, the duration of the second video segment is equal to the photographing duration set in advance.

Each video frame in the second video segment may serve as the current video frame of the second video segment.

As described above, in operation 13, a first image region corresponding to the target object in a first target video frame of the first video segment is acquired, and a second image region corresponding to the target object in the current video frame of the second video segment is acquired.

The first target video frame of the first video segment corresponds to the current video frame of the second video segment in terms of video frame time.

It is to be noted that the time correspondence herein does not mean that the timestamps are identical, but means that the first video segment and the second video segment are in a corresponding relationship in time sequence. The corresponding relationship may be that an Nth video frame of the first video segment corresponds to an Mth video frame of the second video segment, where N and M may be the same, and may also be different.
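
One simple correspondence policy, shown below as an assumption of this example rather than a mapping required by the disclosure, pairs frames by their relative positions in the segments, so that N and M coincide when the segments are equally long and differ otherwise.

```python
# Sketch: map the M-th frame of the second segment to a frame index N in the
# first segment at the same relative temporal position.
def corresponding_index(m, len_first, len_second):
    return round(m * (len_first - 1) / max(len_second - 1, 1))
```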

By identifying the target object in the first video segment (i.e., identifying the pixels in each video frame of the first video segment that belong to the target object) in operation 11, the position of the target object in each video frame of the first video segment can be located, and an image region corresponding to the target object in each video frame of the first video segment can further be extracted intact. Therefore, by means of operation 11, the target object can be identified from the first target video frame of the first video segment; and thus, the first image region corresponding to the target object in the first target video frame can be acquired.

Referring to the above method for identifying the target object, based on the same principle, the target object in the current video frame of the second video segment may be identified. Thus, the position of the target object in the current video frame of the second video segment may be determined based on the identification result, and an image region corresponding to the target object in the current video frame of the second video segment can be extracted intact, i.e., the second image region corresponding to the target object in the current video frame of the second video segment can be acquired.

As described above, in operation 14, picture splicing is performed on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region to obtain a processed first video frame.

Accordingly, the obtained first video frame not only includes the image content (i.e., the second image region) corresponding to the target object in the current video frame of the second video segment, but also includes the image content (i.e., the first image region) corresponding to the target object in the first target video frame of the first video segment; thus, a new image generated by the picture splicing and having the clone effect is obtained.

In an embodiment, as shown in FIG. 3, operation 14 may include the following operations.

In operation 31, an image splicing boundary is determined using an image splicing algorithm according to the first image region and the second image region.

The image splicing algorithm may be an existing image splicing algorithm, an image fusion algorithm, or the like. For two or more images, after the portion of each picture that needs to be kept is determined, the image splicing algorithm can determine, in each image, the pixels suitable for serving as a splicing boundary along which the image is spliced with the other images, and multiple such pixels form the splicing boundary. Therefore, upon the determination that the first image region and the second image region need to be kept, the image splicing boundary can be directly determined by the image splicing algorithm according to the first target video frame of the first video segment, the first image region, the current video frame of the second video segment, and the second image region.

In operation 32, according to the image splicing boundary, a first local image including the first image region is acquired from the first target video frame, and a second local image including the second image region is acquired from the current video frame of the second video segment.

For example, with the image splicing boundary as a dividing line, all pixels located on the same side of the image splicing boundary as the first image region are acquired from the first target video frame to serve as the first local image. Also for example, besides all pixels located on the same side of the image splicing boundary as the first image region in the first target video frame, the first local image further includes a part or all of the pixels located at the image splicing boundary.

For example, with the image splicing boundary as a dividing line, all pixels located on the same side of the image splicing boundary as the second image region are acquired from the current video frame of the second video segment to serve as the second local image. Also for example, besides all pixels located on the same side of the image splicing boundary as the second image region in the current video frame of the second video segment, the second local image further includes a part or all of the pixels located at the image splicing boundary.

The first local image and the second local image may form an image of the same size as the original video frame.

In operation 33, the first local image and the second local image are spliced into the first video frame.

The first local image and the second local image obtained in operation 32 may be directly spliced to obtain a new image serving as the processed first video frame.
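
The sketch below illustrates operation 31 to operation 33 under strong simplifying assumptions: the two frames are already aligned and of equal size, the first target object lies entirely to the left of the second, and the splicing boundary is reduced to a single vertical line midway between them, which is far cruder than a real seam-finding or image fusion algorithm would produce.

```python
# Sketch: splice two aligned frames along a vertical boundary between the
# two target-object regions (boolean masks mask1 and mask2).
import numpy as np

def splice(first_target_frame, current_frame, mask1, mask2):
    right_of_obj1 = np.where(mask1.any(axis=0))[0].max()  # rightmost column of object 1
    left_of_obj2 = np.where(mask2.any(axis=0))[0].min()   # leftmost column of object 2
    split = (right_of_obj1 + left_of_obj2) // 2           # image splicing boundary
    out = current_frame.copy()                            # second local image side
    out[:, :split] = first_target_frame[:, :split]        # first local image side
    return out                                            # processed first video frame
```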

In an embodiment, the user may first photograph a video with the terminal device to obtain the first video segment; and upon the completion of the first photographing, the user may continue to photograph a video with the terminal device for a second time to obtain the second video segment. Thereafter, based on two video frames corresponding to each other in the first video segment and the second video segment, operation 11 to operation 14 are executed to obtain the processed first video frame; and the first video frame has the clone effect for the target object.

In the embodiment, the target object in the first video segment is identified; the current video frame of the second video segment is acquired; the first image region corresponding to the target object in the first target video frame of the first video segment is acquired, and the second image region corresponding to the target object in the current video frame of the second video segment is acquired; and the picture splicing is performed on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region to obtain the processed first video frame. The first target video frame and the current video frame of the second video segment correspond to each other in terms of video frame time. In this way, based on two existing video segments, for two video frames having corresponding time in the two video segments, a picture portion including the target object in one video frame is spliced with a picture portion including the target object in the other video frame to obtain the processed first video frame. Therefore, based on such a first video frame, a spliced video having the clone effect can be quickly obtained in the subsequent process, complex manual post-editing by a user becomes unnecessary, and the processing efficiency is high.

The first video segment and the second video segment may not be completely consistent in photographing angle, photographing position, and photographing method, which results in position transformation of the picture between the video frame of the first video segment and the video frame of the second video segment. The related position transformation may include, but is not limited to, at least one of the following: translation, rotation, stretch, zoom-in, zoom-out, and distortion. Consequently, in order to make the picture content in the spliced first video frame more harmonious and avoid an excessive position difference between the target objects in the same picture (for example, the target object is located on the ground during photographing; and if the terminal device moves vertically between photographing the first video segment and the second video segment, one target object appears higher and another target object appears lower in the spliced picture), picture alignment processing may further be performed before the picture splicing is performed on the first target video frame and the current video frame of the second video segment.

According to some embodiments, before operation 14 in which the picture splicing is performed on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region, the method may further include the following operations: a target frame in the first video segment is determined as a reference frame; picture alignment processing is performed on the first target video frame and the reference frame; and/or the picture alignment processing is performed on the current video frame of the second video segment and the reference frame. The target frame may be any frame in the first video segment. For example, the target frame may be the first video frame in the first video segment.

The picture alignment processing is performed on the first target video frame and the reference frame, and/or the picture alignment processing is performed on the current video frame of the second video segment and the reference frame. According to some embodiments, the picture alignment processing may include the following operations: target background feature points, each having a same background feature in the reference frame and in a specified video frame, are acquired from among background feature points of the reference frame and of the specified video frame; and the specified video frame is aligned to the reference frame according to the target background feature points. The specified video frame is one of the first target video frame or the current video frame of the second video segment.

In the embodiment, the first video segment and the second video segment are generally photographed in the same environment. The positions and states of the target objects in the first and second video segments may change over time, while the background other than the target objects in the picture tends to be static; thus, the picture alignment may be performed by taking the background in the video frame as the reference.

With a feature extraction algorithm, the background feature points of the reference frame can be extracted, and the background feature points of the specified video frame can be extracted; and based on the respective background feature points of the two frames, the feature points each having a same background feature in the two frames can be determined to serve as the target background feature points. In an embodiment, extracting feature points with a feature extraction algorithm is a conventional method, which is a common technical means in the field.

According to the positions of the target background feature points in the reference frame and the positions of the target background feature points in the specified video frame, a transformation matrix representing the position transformation of the target background feature points in the picture may be obtained. For example, the commonly used least squares method may be used to obtain the transformation matrix. Also for example, the transformation matrix may be a 3×3 matrix.

As described above, the specified video frame is one of the first target video frame or the current video frame of the second video segment. The first target video frame may be aligned to the reference frame based on the transformation matrix between the first target video frame and the reference frame, and/or the current video frame of the second video segment may be aligned to the reference frame based on the transformation matrix between the current video frame of the second video segment and the reference frame. In this way, it can be ensured that when the picture splicing is performed on the first target video frame and the current video frame of the second video segment, the pictures are aligned to each other as much as possible, and the picture in each part of the first video frame is consistent in structure, thereby improving the picture quality of the first video frame and making the picture more harmonious visually.
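
A hedged sketch of this alignment is given below. It substitutes ORB features for the generic feature extraction algorithm and a RANSAC-estimated homography for the least-squares transformation matrix mentioned above; target-object pixels are masked out so that only background feature points participate.

```python
# Sketch: align a specified frame to the reference frame via a 3x3 homography
# estimated from matched background feature points.
import cv2
import numpy as np

def align_to_reference(specified, reference, spec_obj_mask, ref_obj_mask):
    orb = cv2.ORB_create(1000)
    # Detect feature points on the background only (target object excluded).
    k1, d1 = orb.detectAndCompute(specified, (~spec_obj_mask).astype(np.uint8))
    k2, d2 = orb.detectAndCompute(reference, (~ref_obj_mask).astype(np.uint8))
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # 3x3 transformation matrix
    h, w = reference.shape[:2]
    return cv2.warpPerspective(specified, H, (w, h))      # frame aligned to the reference
```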

In some embodiments, the method may further include the following operations: a target video segment is generated based on the first video segment; and the target video segment is displayed in the video preview interface.

In an embodiment, multiple video frames including the first video frame may be composited to form the target video segment; and the target video segment is displayed in the video preview interface. Other video frames in the target video segment may be directly from the first video segment or directly from the second video segment; or, other video frames in the target video segment may be video frames generated in the same manner as the first video frame and having the clone effect.

For example, with the advancement of time, e.g., from the first frame of the video segment to the last frame of the video segment, the current video frame of the second video segment changes over time; and correspondingly, the first target video frame corresponding to the current video frame of the second video segment in terms of time also changes over time. In this way, whenever the current video frame of the second video segment is obtained, the above operation 11 to operation 14 may be executed, to obtain multiple processed first video frames correspondingly; and based on these first video frames and their corresponding time sequences in the first video segment (i.e., the corresponding sequences in the second video segment; as described above, the first video frame is generated based on the video frames having time correspondence in the first video segment and the second video segment), the target video segment may be generated.

By means of the above manner, after the processed first video frame is obtained, the target video segment may be generated directly based on the first video frame, i.e., the video segment having the clone effect is generated.
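
For illustration, compositing the processed frames into the target video segment could look like the sketch below; the 30 fps rate, the mp4v codec, and the output path are assumptions of the example.

```python
# Sketch: write the processed first video frames, in time order, as the
# target video segment.
import cv2

def write_target_segment(frames, path="target_segment.mp4", fps=30):
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in frames:          # frames ordered by their corresponding time
        writer.write(frame)
    writer.release()
```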

In some embodiments, before operation 14, the method may further include the following operation: duration alignment processing is performed on the first video segment and the second video segment in a case where the first video segment is different from the second video segment in duration.

As the first video segment and the second video segment are photographed separately, they may have different durations. In this case, before the picture splicing is performed on the first target video frame and the current video frame of the second video segment, the duration alignment processing may further be performed on the first video segment and the second video segment.

For example, the duration alignment processing may include any one of the following manners.

In a first manner, with the video segment having the shorter duration among the first and second video segments as a reference, a part of the video frames in the video segment having the longer duration among the first and second video segments is deleted, such that the first video segment is the same as the second video segment in duration.

In a second manner, according to the existing video frames in the video segment having the shorter duration among the first and second video segments, video frames are added to the video segment having the shorter duration, such that the first video segment is the same as the second video segment in duration.

In the first manner, the duration alignment processing may be to delete a part of the video frames in the video segment having the longer duration by taking the video segment having the shorter duration as the reference, such that the first video segment is the same as the second video segment in duration.

For example, if the first video segment includes 300 video frames, and the second video segment includes 500 video frames, the 301st to 500th video frames in the second video segment may be deleted from the second video segment, and the first 300 frames in the second video segment are kept to serve as the second video segment used in the above processing.

In the second manner, the alignment processing may be to expand the video frames in the video segment having the shorter duration by taking the video segment having the longer duration as the reference. That is, according to the existing video frames in the video segment having the shorter duration among the first and second video segments, video frames are added to the video segment having the shorter duration, such that the first video segment is the same as the second video segment in duration.

The expansion may be implemented circularly or reciprocally based on the existing video frames in the video segment having the shorter duration.

For example, it is assumed that the first video segment includes 300 video frames, the second video segment includes 200 video frames, and the video frames in the second video segment are sequentially numbered as u1 to u200. If the circular manner is used for expansion, the second video segment may be expanded as u1, u2, u3, ..., u199, u200, u1, u2, u3, ..., u100. If the reciprocal manner is used for expansion, the second video segment may be expanded as u1, u2, u3, ..., u199, u200, u199, u198, ..., u100.
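
At the level of frame indices, the trimming of the first manner and the two expansions of the second manner can be sketched as follows, assuming frames are held in a Python list.

```python
# Sketch: duration alignment by trimming, circular expansion, or reciprocal
# ("ping-pong") expansion.
def trim(frames, target_len):
    return frames[:target_len]                 # first manner: drop trailing frames

def expand_circular(frames, target_len):
    n = len(frames)
    return [frames[i % n] for i in range(target_len)]   # u1..u200, u1..u100

def expand_reciprocal(frames, target_len):
    n = len(frames)
    period = 2 * n - 2                         # forward then backward pass
    out = []
    for i in range(target_len):
        j = i % period
        out.append(frames[j] if j < n else frames[period - j])
    return out                                 # u1..u200, then u199 down to u100
```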

In some embodiments, the method may further include the following operations: a photographing parameter used when the image capturing apparatus captures the first video segment is acquired; and the image capturing apparatus is controlled to capture an image according to the photographing parameter to obtain the second video segment.

When the second video segment is photographed by the image capturing apparatus, the photographing parameter used when the image capturing apparatus photographs the first video segment may be directly used. For example, when the image capturing apparatus starts to photograph the first video segment, the photographing parameter of the image capturing apparatus may be locked; and thus, when the second video segment is photographed, the image capturing apparatus may photograph the second video segment automatically based on the photographing parameter consistent with that of the first video segment.

The photographing parameter of the image capturing apparatus may include, but is not limited to, at least one of the following: ISO, exposure time, focus distance, and white balance parameter.
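
A sketch of the parameter locking is shown below; `Camera`, `apply_settings`, and `record` are hypothetical names standing in for a device API, not calls defined by the present disclosure.

```python
# Sketch: lock the first segment's photographing parameters and reuse them
# when recording the second segment.
from dataclasses import dataclass

@dataclass
class PhotographingParams:
    iso: int
    exposure_time_s: float
    focus_distance_m: float
    white_balance_k: int

def record_second_segment(camera, locked: PhotographingParams):
    camera.apply_settings(locked)  # hypothetical call: reuse the locked parameters
    return camera.record()         # both segments then appear similar
```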

By means of the above manner, when the second video segment is photographed, the photographing parameter corresponding to the first video segment can be used automatically, such that manual adjustment by the user becomes unnecessary, and the problem of tedious setting of the photographing parameter can be solved. Additionally, by photographing the first video segment and the second video segment with the same photographing parameter, the two segments have a similar picture appearance, which is advantageous for the subsequent video processing.

Embodiments of the present disclosure may further provide a real-time splice preview function based on the photographed content of the first video segment. In some embodiments, the method according to the present disclosure may further include the following operations, as shown in FIG. 4.

In operation 41, a video stream captured by an image capturing apparatus is acquired in real time.

In operation 42, for a current video frame of the video stream, a third image region corresponding to the target object in a second target video frame is acquired.

In operation 43, the third image region is added to the current video frame of the video stream to obtain a processed second video frame.

In operation 44, the second video frame is displayed in a video preview interface.

The video stream may be considered as video frames transmitted in real time. Taking the terminal device as an example, the terminal device can acquire a series of video frames in real time through a viewing frame of the image capturing apparatus; the series of video frames form the video stream; and the video frame that can be acquired at present is the current video frame of the video stream. As described in operation 41, the terminal device can acquire, in real time, the video stream captured by the image capturing apparatus.

In operation 42, for the current video frame of the video stream, the terminal device acquires the third image region corresponding to the target object in the second target video frame.

The second target video frame is a video frame in the first video segment corresponding to the current video frame of the video stream in terms of time. It is to be noted that the time correspondence herein does not mean that the timestamps are identical, but means that the first video segment and the video stream are in a corresponding relationship in time sequence. The corresponding relationship may be that a Kth video frame of the first video segment corresponds to an Ith video frame of the video stream, where K and I may be the same.

After the target object is identified in operation 11 (for example, through the target object identification model), the position of the target object in a given video frame can be located, and the image region corresponding to the target object in the video frame can be extracted intact. Therefore, for the current video frame of the video stream, the third image region corresponding to the target object in the second target video frame can be acquired by identifying the target object. The identification of the target object has been described above and will not be elaborated herein. Meanwhile, the acquisition of the third image region follows the same principle as the above acquisition of the first image region, and no more details are elaborated herein.

In operation 43, the third image region is added to the current video frame of the video stream to obtain the processed second video frame.

In an embodiment, operation 43 may include the following operations: an added position where the third image region is added in the current video frame of the video stream is determined according to a position where the third image region is located in the second target video frame; and the third image region is added to the added position in the current video frame of the video stream.

In an embodiment, the operation that the added position where the third image region is added in the current video frame of the video stream is determined according to the position where the third image region is located in the second target video frame may include the following operation: a position in the current video frame of the video stream, consistent with the position where the third image region is located in the second target video frame, is used as the added position where the third image region is added in the current video frame of the video stream.

In other words, if the position where the third image region is located in the second target video frame corresponds to a position coordinate set D1, the position coordinate set D1 may be used as the added position where the third image region is added in the current video frame of the video stream.
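
In this same-coordinates case, the addition reduces to copying the region's pixels at their original positions, as in the minimal sketch below; it assumes the frames are equally sized arrays and the region is given as a boolean mask over the coordinate set D1.

```python
# Sketch: overlay the third image region onto the stream frame at the same
# coordinates it occupies in the second target video frame.
def overlay_third_region(stream_frame, target_frame, region_mask):
    out = stream_frame.copy()
    out[region_mask] = target_frame[region_mask]  # paste at identical coordinates
    return out  # the processed second video frame shown in the preview
```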

In another embodiment, the operation that the added position where the third image region is added in the current video frame of the video stream is determined according to the position where the third image region is located in the second target video frame may include the following operations: first background feature points within a preset proximity range of the third image region are acquired from among background feature points of the second target video frame; second background feature points, each having a same background feature as a respective one of the first background feature points, are determined from among background feature points of the current video frame of the video stream; and the added position is determined according to positions where the second background feature points are located in the current video frame of the video stream.

The background feature points of the second target video frame can be extracted by the above feature extraction algorithm. The first background feature points within the preset proximity range of the third image region may be determined according to the positions where the background feature points of the second target video frame are located in the second target video frame, in combination with the position where the third image region is located in the second target video frame.

Correspondingly, the background feature points of the current video frame of the video stream can be extracted by the above feature extraction algorithm; and thus, the second background feature points, each having a same background feature as a respective one of the first background feature points, may be determined from among the background feature points of the current video frame of the video stream.

The added position is determined according to the positions where the second background feature points are located in the current video frame of the video stream, and the added position is a position surrounded by the second background feature points.
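
A hedged sketch of this feature-anchored placement follows; ORB matching and the mean point shift are assumptions standing in for the disclosure's unspecified feature extraction and position determination, and a robust fit would normally replace the plain mean.

```python
# Sketch: estimate where to add the third image region from background
# feature points in a band around the region.
import cv2
import numpy as np

def anchored_offset(target_frame, stream_frame, region_mask, margin=40):
    ring = cv2.dilate(region_mask.astype(np.uint8), np.ones((margin, margin), np.uint8))
    ring[region_mask] = 0                                # band around the region only
    orb = cv2.ORB_create(500)
    k1, d1 = orb.detectAndCompute(target_frame, ring)    # first background feature points
    k2, d2 = orb.detectAndCompute(stream_frame, None)    # stream-frame feature points
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    if not matches:
        return np.zeros(2)                               # fall back: same coordinates
    shifts = [np.subtract(k2[m.trainIdx].pt, k1[m.queryIdx].pt) for m in matches]
    return np.mean(shifts, axis=0)                       # (dx, dy) for the added position
```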

Additionally, the first video segment and the video stream may not be completely consistent in photographing angle and photographing method, which results in position transformation of the picture between the video frame of the first video segment and the video frame of the video stream. The related position transformation may include, but is not limited to, at least one of the following: translation, rotation, stretch, zoom-in, zoom-out, and distortion. Consequently, in order to make the previewed picture content more harmonious and avoid an excessive position difference between the target objects in the same picture of the preview interface (for example, the target object is located on the ground during photographing; and if the terminal device moves vertically between photographing the first video segment and the video stream, one target object appears higher and another target object appears lower in the spliced picture), the picture alignment processing may further be performed before the third image region is added to the current video frame of the video stream.

In some embodiments, before the operation that the third image region is added to the current video frame of the video stream, the method may further include the following operation: the picture alignment processing is performed on the second target video frame and the current video frame of the video stream.

For example, the picture alignment processing may include the following operations: third background feature points having a same background feature are acquired from among the background feature points of the second target video frame and of the current video frame of the video stream; and the second target video frame is aligned to the current video frame of the video stream according to the third background feature points.

In the embodiment, the first video segment and the video stream are generally photographed in the same environment. The positions and states of the target objects in the first video segment and the video stream may change over time, while the background other than the target objects in the picture tends to be static; thus, the picture alignment may be performed by taking the background in the video frame as the reference.

With the feature extraction algorithm, the background feature points of the second target video frame can be extracted, and the background feature points of the current video frame of the video stream can be extracted; and based on the respective background feature points of the two frames, the feature points each having a same background feature in the two frames can be determined to serve as the third background feature points.

According to the positions of the third background feature points in the second target video frame, and the positions of the third background feature points in the current video frame of the video stream, a transformation matrix representing the position transformation of the feature points in the picture may be obtained. The second target video frame may be aligned to the current video frame of the video stream based on the transformation matrix.

In this way, it can be ensured that when the third image region is added to the current video frame of the video stream, the pictures are aligned to each other as much as possible, and the picture in each part of the real-time preview interface is consistent in structure, thereby improving the picture quality of the second video frame and making the picture more harmonious visually.

As described above, in operation 44, the second video frame is displayed in the video preview interface.

For example, after the second video frame is obtained, the second video frame is displayed in the video preview interface.

It is to be noted that when the current video frame of the video stream is previewed in real time, no attention is paid to whether the video stream contains the target object or to the position of the target object; instead, only the third image region (the region corresponding to the target object in the second target video frame) is overlaid on the current video frame of the video stream to provide the preview for the user, such that the user can preview, in real time, the picture having the splicing effect, and then confirm the picture effect of the photographed content and determine the time for photographing the second video segment.

Accordingly, the display effect of adding the target object in the first video segment to the current picture can be viewed in real time based on the real-time preview function, thereby helping the user confirm the display effect of the photographed content; and therefore, the second video segment is recorded more accurately, and the target object in the second video segment is located at a position desired by the user.

FIG. 5A-FIG. 5C are schematic diagrams of an interface of a terminal device in an implementation process of the above described method for processing a video.

FIG. 5A illustrates a display of an interface of a terminal device when the first video segment is photographed. P1 is the target object in the first video segment. As shown in FIG. 5A, when the first video segment is photographed, the target object is located on the left of the picture. The round region on the right of the picture is the photographing button. FIG. 5A shows a state in which the photographing button is pressed. When the photographing button is pressed, the photographed duration is displayed in real time in the center of the round region. As can be seen from FIG. 5A, the first video segment has been photographed for 1.6 s.

FIG. 5B illustrates a display of an interface of a terminal device during real-time preview after the first video segment is photographed. The round region on the right of the picture is the photographing button. FIG. 5B shows a state in which the photographing button is not pressed. As can be seen from FIG. 5B, the second video segment is not being photographed, but the video is previewed in real time. P2 is the target object in the video stream that is acquired by the terminal device in real time, and reflects the position where the target object is actually located. As can be seen from FIG. 5B, after the first video segment is photographed, the target object moves from the position on the left of the picture to the position on the right of the picture. Referring to operation 41 to operation 44, P1 is the third image region corresponding to the target object in the second target video frame of the first video segment. The third image region is added to the current video frame of the video stream in real time to obtain the processed second video frame, which is displayed in the video preview interface. Therefore, the user holding the terminal device may preview, in real time, what spliced picture would be formed if the target object in the first video segment were added to the current picture; thereby it is possible to better control the photographing of the second video segment, select the appropriate time to photograph the second video segment, and avoid the coincidence of, or an excessive distance between, the target objects in the spliced video picture.

FIG. 5C illustrates a display of an interface of a terminal device when the second video segment is photographed. As shown in FIG. 5C, when the second video segment is photographed, the target object is located on the left of the picture. The round region on the right of the picture is the photographing button. FIG. 5C shows a state in which the photographing button is pressed. When the photographing button is pressed, the photographed duration is displayed in real time in the center of the round region. As can be seen from FIG. 5C, the second video segment has been photographed for 0.6 s. P1 is the target object in the first video segment, and P2 is the target object in the video stream that is acquired by the terminal device in real time. The description and display of P1 in FIG. 5C are similar to those in FIG. 5B; i.e., when the second video segment is photographed, the terminal device still acquires the video stream in real time and still previews the video in real time. The difference between FIG. 5C and FIG. 5B lies in whether the video stream acquired in real time is recorded: the video stream acquired in real time in FIG. 5C is recorded as the second video segment, while the video stream acquired in real time in FIG. 5B is not recorded. P1 is the third image region corresponding to the target object in the second target video frame of the first video segment. The third image region is added to the current video frame of the video stream in real time to obtain the processed second video frame, which is displayed in the video preview interface. Therefore, when photographing the second video segment, the user holding the terminal device may preview, in real time, what spliced picture would be formed if the target object in the first video segment were added to the current picture; thereby it is possible to better control the photographing of the second video segment and avoid the coincidence of, or an excessive distance between, the target objects in the spliced video picture.

In addition, when the video preview interface displays the second video frame (previews the second video frame in real time) or the target video segment (the video segment generated by the picture splicing), an entry for canceling the operation may further be provided for the user. In other words, when the video preview interface displays the second video frame, the user may re-photograph the first video segment by canceling the operation; and when the video preview interface displays the target video segment, the user may re-photograph the second video segment by canceling the operation.

FIG. 6 is a block diagram of an apparatus 60 for processing a video according to an exemplary embodiment, which may be applied to a terminal device. As shown in FIG. 6, the apparatus 60 may include: an identification module 61 configured to identify a target object in a first video segment; a first acquisition module 62 configured to acquire a current video frame of a second video segment; a second acquisition module 63 configured to acquire a first image region corresponding to the target object in a first target video frame of the first video segment, and acquire a second image region corresponding to the target object in the current video frame of the second video segment, the first target video frame corresponding to the current video frame of the second video segment in terms of video frame time; and a splicing module 64 configured to perform picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region to obtain a processed first video frame.

In some embodiments, the splicing module 64 includes: a first determination submodule configured to determine an image splicing boundary using an image splicing algorithm according to the first image region and the second image region; a first acquisition submodule configured to acquire, according to the image splicing boundary, a first local image including the first image region from the first target video frame, and acquire a second local image including the second image region from the current video frame of the second video segment; and a splicing submodule configured to splice the first local image and the second local image into the first video frame.

In some embodiments, the apparatus 60 further includes: a first determination module configured to, before the picture splicing is performed on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region, determine a target frame in the first video segment as a reference frame; a first alignment module configured to perform picture alignment processing on the first target video frame and the reference frame; and/or a second alignment module configured to perform the picture alignment processing on the current video frame of the second video segment and the reference frame.

In some embodiments, the picture alignment processing includes: target background feature points, each having a same background feature in the reference frame and in a specified video frame, are acquired from among background feature points of the reference frame and of the specified video frame, the specified video frame being one of the first target video frame or the current video frame of the second video segment; and the specified video frame is aligned to the reference frame according to the target background feature points.

In some embodiments, the apparatus 60 further includes: a capturing module configured to acquire, in real time, a video stream captured by an image capturing apparatus; a third acquisition module configured to acquire, for a current video frame of the video stream, a third image region corresponding to the target object in a second target video frame, the second target video frame being a video frame corresponding to the current video frame of the video stream in the first video segment in terms of time; an addition module configured to add the third image region to the current video frame of the video stream to obtain a processed second video frame; and a first preview module configured to display the second video frame in a video preview interface.
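
A hypothetical driver loop tying these four modules together might look as follows, reusing the `preview_overlay` sketch above. The use of `cv2.VideoCapture(0)` for the image capturing apparatus, the per-index frame pairing, and the assumption that stream frames and segment frames share the same size are all illustrative.

```python
import cv2

def preview_loop(first_segment, regions, window="preview"):
    """Sketch of the capture/acquire/add/display pipeline. `first_segment`
    is a list of frames; `regions[i]` is the boolean mask of the third
    image region in frame i (both hypothetical inputs)."""
    cap = cv2.VideoCapture(0)              # image capturing apparatus
    i = 0
    while cap.isOpened() and i < len(first_segment):
        ok, stream_frame = cap.read()      # current video frame of the video stream
        if not ok:
            break
        # Add the third image region to obtain the processed second video frame.
        second_frame = preview_overlay(stream_frame, first_segment[i], regions[i])
        cv2.imshow(window, second_frame)   # video preview interface
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
        i += 1
    cap.release()
    cv2.destroyAllWindows()
```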

In some embodiments, the addition module includes: a second determination submodule configured to determine an added position where the third image region is added in the current video frame of the video stream according to a position where the third image region is located in the second target video frame; and an addition submodule configured to add the third image region to the added position in the current video frame of the video stream.

In some embodiments, the second determination submodule includes: a second acquisition submodule configured to acquire first background feature points within a preset proximity range of the third image region from among background feature points of the second target video frame; a third determination submodule configured to determine second background feature points each having a same background feature with a respective one of the first background feature points from among background feature points of the current video frame of the video stream; and a fourth determination submodule configured to determine the added position according to positions where the second background feature points are located in the current video frame of the video stream.
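
The sketch below shows one way this could be realized with OpenCV: ORB features are detected only in a band around (but outside) the third image region of the second target video frame, matched against the current stream frame, and the median displacement of the matched background points shifts the region's position. The bounding-box input, the margin value, and the median heuristic are illustrative assumptions.

```python
import cv2
import numpy as np

def estimate_added_position(target_frame, stream_frame, region_bbox, margin=40):
    """Sketch: estimate where the third image region (given by `region_bbox`,
    an (x, y, w, h) tuple in `target_frame`) should be added in `stream_frame`,
    using background feature points near the region."""
    x, y, w, h = region_bbox
    gray_t = cv2.cvtColor(target_frame, cv2.COLOR_BGR2GRAY)
    gray_s = cv2.cvtColor(stream_frame, cv2.COLOR_BGR2GRAY)
    # Restrict detection to a band around, but excluding, the object region.
    band = np.zeros(gray_t.shape, np.uint8)
    band[max(0, y - margin):y + h + margin, max(0, x - margin):x + w + margin] = 255
    band[y:y + h, x:x + w] = 0
    orb = cv2.ORB_create(500)
    kp_t, des_t = orb.detectAndCompute(gray_t, band)   # first background feature points
    kp_s, des_s = orb.detectAndCompute(gray_s, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_t, des_s)              # second background feature points
    # The median displacement of matched background points gives the offset.
    shifts = [np.subtract(kp_s[m.trainIdx].pt, kp_t[m.queryIdx].pt) for m in matches]
    dx, dy = np.median(shifts, axis=0)
    return int(x + dx), int(y + dy)
```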

In some embodiments, the apparatus 60 further includes: a third alignment module configured to, before performing the picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region, in a case where the first video segment and the second video segment have different durations, perform duration alignment processing on the first video segment and the second video segment.

In some embodiments, the duration alignment processing includes any one of the following manners: with a video segment having a shorter duration among the first and second video segments as a reference, a part of the video frames in a video segment having a longer duration among the first and second video segments are deleted, such that the first video segment is the same as the second video segment in duration; or, according to existing video frames in the video segment having the shorter duration among the first and second video segments, video frames are added to the video segment having the shorter duration, such that the first video segment is the same as the second video segment in duration.
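
As a concrete, hypothetical illustration, both manners can be expressed as uniform index resampling over frame lists: shrinking the longer list drops frames evenly, while growing the shorter list repeats existing frames (a simple stand-in for true frame interpolation). The function below is a sketch under these assumptions.

```python
import numpy as np

def align_durations(frames_a, frames_b, extend_shorter=False):
    """Sketch: make two frame lists equal in length. By default the longer
    segment is trimmed (frames deleted evenly); with extend_shorter=True the
    shorter segment is grown by repeating its existing frames."""
    if len(frames_a) == len(frames_b):
        return frames_a, frames_b
    target = max(len(frames_a), len(frames_b)) if extend_shorter \
        else min(len(frames_a), len(frames_b))

    def resample(frames):
        idx = np.linspace(0, len(frames) - 1, target).round().astype(int)
        return [frames[i] for i in idx]  # identity when len(frames) == target

    return resample(frames_a), resample(frames_b)
```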

In some embodiments, the apparatus 60 further includes: a video generation module configured to generate a target video segment based on the first video segment; and a second preview module configured to display the target video segment in the video preview interface.

For the apparatus in the foregoing embodiments, the manner in which each module performs its operations has been described in detail in the method embodiments.

FIG. 7 is a block diagram of an apparatus for processing a video according to an exemplary embodiment. For example, the apparatus 700 may be a mobile phone, a computer, a digital broadcast terminal device, a messaging device, a gaming console, a tablet, a medical device, exercise equipment, a PDA, and the like.

Referring to FIG. 7, the apparatus 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.

The processing component 702 typically controls overall operations of the apparatus 700, such as the operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to perform all or part of the operations in the above method for processing a video. Moreover, the processing component 702 may include one or more modules which facilitate the interaction between the processing component 702 and other components. For instance, the processing component 702 may include a multimedia module to facilitate the interaction between the multimedia component 708 and the processing component 702.

The memory 704 is configured to store various types of data to support the operation of the apparatus 700. Examples of such data include instructions for any applications or methods operated on the apparatus 700, contact data, phonebook data, messages, pictures, video, etc. The memory 704 may be implemented using any type of volatile or non-volatile memory devices, or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic or optical disk.

The power component 706 provides power to various components of the apparatus 700. The power component 706 may include a power management system, one or more power sources, and any other components associated with the generation, management, and distribution of power in the apparatus 700.

The multimedia component 708 includes a screen providing an output interface between the apparatus 700 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes the touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense a boundary of a touch or swipe action, but also sense a period of time and a pressure associated with the touch or swipe action. In some embodiments, the multimedia component 708 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 700 is in an operation mode, such as a photographing mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focus and optical zoom capability.

The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 includes a microphone (MIC) configured to receive an external audio signal when the apparatus 700 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 704 or transmitted via the communication component 716. In some embodiments, the audio component 710 further includes a speaker configured to output audio signals.

The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules. The peripheral interface modules may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, a volume button, a starting button, and a locking button.

The sensor component 714 includes one or more sensors to provide status assessments of various aspects of the apparatus 700. For instance, the sensor component 714 may detect an on/off status of the apparatus 700 and relative positioning of components, such as a display and a keypad of the apparatus 700, and the sensor component 714 may further detect a change in a position of the apparatus 700 or a component of the apparatus 700, presence or absence of contact between the user and the apparatus 700, orientation or acceleration/deceleration of the apparatus 700, and a change in temperature of the apparatus 700. The sensor component 714 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 714 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, configured for use in an imaging application. In some embodiments, the sensor component 714 may also include an accelerometer sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 716 is configured to facilitate wired or wireless communication between the apparatus 700 and other devices. The apparatus 700 may access a communication-standard-based wireless network, such as a Wireless Fidelity (WiFi) network, a 4th-Generation (4G) or 5th-Generation (5G) network, or a combination thereof. In one exemplary embodiment, the communication component 716 receives a broadcast signal or broadcast-associated information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 716 further includes a near field communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on a radio frequency identification (RFID) technology, an infrared data association (IrDA) technology, an ultra-wideband (UWB) technology, a Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 700 may be implemented with one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components, for performing the above method for processing a video.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions, such as the memory 704 including the instructions, is further provided; and the instructions may be executed by the processor 720 of the apparatus 700 to complete the above method for processing a video. For example, the non-transitory computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disc, an optical data storage device, and the like.

In another exemplary embodiment, a computer program product is further provided. The computer program product includes a computer program executable by a programmable apparatus, and the computer program has a code portion configured to be executed by the programmable apparatus to implement the above method for processing a video.

Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the present disclosure disclosed here. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the present disclosure being indicated by the following claims.

It will be appreciated that the present disclosure is not limited to the exact construction that has been described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from the scope thereof. It is intended that the scope of the present disclosure only be limited by the appended claims.

What is claimed is:
1. A method for processing a video, applied to a terminal device, the method comprising: identifying a target object in a first video segment; acquiring a current video frame of a second video segment; acquiring a first image region corresponding to the target object in a first target video frame of the first video segment, and acquiring a second image region corresponding to the target object in the current video frame of the second video segment, wherein the first target video frame corresponds to the current video frame of the second video segment in terms of video frame time; and performing picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region to obtain a processed first video frame.
2. The method of claim 1, wherein performing the picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region to obtain the processed first video frame comprises: determining an image splicing boundary according to the first image region and the second image region; according to the image splicing boundary, acquiring a first local image including the first image region from the first target video frame, and acquiring a second local image including the second image region from the current video frame of the second video segment; and splicing the first local image and the second local image into the first video frame.
3. The method of claim 1, before performing the picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region, further comprising: determining a target frame in the first video segment as a reference frame; and performing one of: performing picture alignment processing on the first target video frame and the reference frame; or performing the picture alignment processing on the current video frame of the second video segment and the reference frame.
4. The method of claim 3, wherein the picture alignment processing comprises: acquiring, from among background feature points of the reference frame and of a specified video frame, target background feature points each having a same background feature in the reference frame and in the specified video frame, the specified video frame being one of the first target video frame or the current video frame of the second video segment; and aligning the specified video frame to the reference frame according to the target background feature points.
5. The method of claim 1, further comprising: acquiring, in real time, a video stream captured by an image capturing apparatus; for a current video frame of the video stream, acquiring a third image region corresponding to the target object in a second target video frame, the second target video frame corresponding to the current video frame of the video stream in the first video segment in terms of video frame time; adding the third image region to the current video frame of the video stream to obtain a processed second video frame; and displaying the second video frame in a video preview interface.
6. The method of claim 5, wherein adding the third image region to the current video frame of the video stream comprises: determining an added position where the third image region is added in the current video frame of the video stream according to a position where the third image region is located in the second target video frame; and adding the third image region to the added position in the current video frame of the video stream.
7. The method of claim 6, wherein determining the added position where the third image region is added in the current video frame of the video stream according to the position where the third image region is located in the second target video frame comprises: acquiring first background feature points within a preset proximity range of the third image region from among background feature points of the second target video frame; determining second background feature points each having a same background feature with a respective one of the first background feature points from among background feature points of the current video frame of the video stream; and determining the added position according to positions where the second background feature points are located in the current video frame of the video stream.
8. The method of claim 1, before performing the picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region, further comprising: performing duration alignment processing on the first video segment and the second video segment in a case where the first video segment is different from the second video segment in duration.
9. The method of claim 8, wherein the duration alignment processing comprises one of: with a video segment having a shorter duration among the first and second video segments as a reference, deleting a part of video frames in a video segment having a longer duration among the first and second video segments, such that the first video segment is the same as the second video segment in duration; or according to existing video frames in the video segment having the shorter duration among the first and second video segments, adding video frames to the video segment having the shorter duration, such that the first video segment is the same as the second video segment in duration.
10. The method of claim 1, further comprising: generating a target video segment based on the first video segment; and displaying the target video segment in a video preview interface.
11. An apparatus for processing a video, comprising: a processor; and a memory configured to store instructions executable by the processor, wherein the processor is configured to: identify a target object in a first video segment; acquire a current video frame of a second video segment; acquire a first image region corresponding to the target object in a first target video frame of the first video segment, and acquire a second image region corresponding to the target object in the current video frame of the second video segment, wherein the first target video frame corresponds to the current video frame of the second video segment in terms of video frame time; and perform picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region to obtain a processed first video frame.
12. The apparatus of claim 11, wherein in performing the picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region to obtain the processed first video frame, the processor is further configured to: determine an image splicing boundary according to the first image region and the second image region; according to the image splicing boundary, acquire a first local image including the first image region from the first target video frame, and acquire a second local image including the second image region from the current video frame of the second video segment; and splice the first local image and the second local image into the first video frame.
13. The apparatus of claim 11, wherein before performing the picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region, the processor is further configured to: determine a target frame in the first video segment as a reference frame; and perform one of: performing picture alignment processing on the first target video frame and the reference frame; or performing the picture alignment processing on the current video frame of the second video segment and the reference frame.
14. The apparatus of claim 13, wherein the picture alignment processing comprises: acquiring, from among background feature points of the reference frame and of a specified video frame, target background feature points each having a same background feature in the reference frame and in the specified video frame, the specified video frame being one of the first target video frame or the current video frame of the second video segment; and aligning the specified video frame to the reference frame according to the target background feature points.
15. The apparatus of claim 11, wherein the processor is further configured to: acquire, in real time, a video stream captured by an image capturing apparatus; for a current video frame of the video stream, acquire a third image region corresponding to the target object in a second target video frame, the second target video frame corresponding to the current video frame of the video stream in the first video segment in terms of time; add the third image region to the current video frame of the video stream to obtain a processed second video frame; and display the second video frame in a video preview interface.
16. The apparatus of claim 15, wherein in adding the third image region to the current video frame of the video stream, the processor is further configured to: determine an added position where the third image region is added in the current video frame of the video stream according to a position where the third image region is located in the second target video frame; and add the third image region to the added position in the current video frame of the video stream.
17. The apparatus of claim 16, wherein in determining the added position where the third image region is added in the current video frame of the video stream according to the position where the third image region is located in the second target video frame, the processor is further configured to: acquire first background feature points within a preset proximity range of the third image region from among background feature points of the second target video frame; determine second background feature points each having a same background feature with a respective one of the first background feature points from among background feature points of the current video frame of the video stream; and determine the added position according to positions where the second background feature points are located in the current video frame of the video stream.
18. The apparatus of claim 11, wherein before performing the picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region, the processor is further configured to: perform duration alignment processing on the first video segment and the second video segment in a case where the first video segment is different from the second video segment in duration.
19. The apparatus of claim 18, wherein the duration alignment processing comprises one of: with a video segment having a shorter duration among the first and second video segments as a reference, deleting a part of video frames in a video segment having a longer duration among the first and second video segments, such that the first video segment is the same as the second video segment in duration; or according to existing video frames in the video segment having the shorter duration among the first and second video segments, adding video frames to the video segment having the shorter duration, such that the first video segment is the same as the second video segment in duration.
20. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed by a processor of a device, cause the device to perform a method for processing a video, the method comprising: identifying a target object in a first video segment; acquiring a current video frame of a second video segment; acquiring a first image region corresponding to the target object in a first target video frame of the first video segment, and acquiring a second image region corresponding to the target object in the current video frame of the second video segment, wherein the first target video frame corresponds to the current video frame of the second video segment in terms of video frame time; and performing picture splicing on the first target video frame and the current video frame of the second video segment according to the first image region and the second image region to obtain a processed first video frame.